Clustering searchable documents is useful for eliminating duplicate content and for showing greater diversity within search results. One of the more challenging aspects of clustering web-related documents (e.g., web pages, images, and videos) is that conventional methods produce clusters that include too many documents, or documents that are duplicates or substantial duplicates of one another. A need therefore exists for improved methods and systems for clustering documents.